There are three primary mechanisms for missing data:
Missing Completely at Random (MCAR): Missingness is entirely random and does not depend on any observed or unobserved features. For example, a survey respondent skips a question by accident.
Missing at Random (MAR): Missingness depends only on observed variables, not on the values of the missing data itself. For instance, if older respondents are less likely to answer a question about their income, the missingness is MAR because it can be explained by the age variable.
Missing Not at Random (MNAR): Missingness depends on unobserved information, possibly the missing values themselves. For example, if individuals with high debt-to-income ratios are less likely to report their income, the missingness is MNAR because it depends on the debt-to-income ratio, which is itself unobserved.
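To make the three definitions concrete, here is a minimal simulation sketch using only the Python standard library. The `simulate` and `observed_mean` helpers, the age cut-off of 50, the drop probabilities, and the log-normal income distribution are all assumptions chosen for illustration; none of them comes from the data analysed here.

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Mask a synthetic income variable under MCAR, MAR and MNAR."""
    rng = random.Random(seed)
    ages = [rng.uniform(20, 80) for _ in range(n)]
    incomes = [rng.lognormvariate(10.5, 0.5) for _ in range(n)]
    median_income = statistics.median(incomes)

    masked = {"MCAR": [], "MAR": [], "MNAR": []}
    for age, income in zip(ages, incomes):
        drop_probability = {
            # MCAR: the same chance of being dropped for every record.
            "MCAR": 0.20,
            # MAR: the chance depends only on the observed age.
            "MAR": 0.35 if age >= 50 else 0.05,
            # MNAR: the chance depends on the income value itself.
            "MNAR": 0.35 if income > median_income else 0.05,
        }
        for name, p in drop_probability.items():
            masked[name].append(None if rng.random() < p else income)
    return ages, incomes, masked


def observed_mean(values):
    """Mean of the entries that are not missing."""
    return statistics.fmean([v for v in values if v is not None])


ages, incomes, masked = simulate()
print("true mean:", round(observed_mean(incomes)))
for name, values in masked.items():
    print(name, "observed mean:", round(observed_mean(values)))
```

In this toy setup age and income are independent, so only the MNAR mask pulls the observed mean away from the true mean: high incomes are dropped more often, and nothing in the observed data can correct for that.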
After selecting four predictors (DELINQ, DEROG, DEBTINC, and CLAGE) for our model, we examine the missing-data mechanisms within these predictors to determine the most appropriate imputation strategies.

DEROG, DELINQ, DEBTINC, CLAGE), segmented by the BAD outcome (0 for no default, 1 for default).
A significant discrepancy in missing data is observed in DEBTINC (debt-to-income ratio), particularly between non-defaulters (BAD = 0) and defaulters (BAD = 1), with defaulters less likely to report their debt-to-income ratio. This pattern suggests that the missingness in DEBTINC is informative and correlates with the likelihood of default. Thus, imputing missing DEBTINC values could introduce bias into the model.
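The comparison by outcome can be sketched as a missing-rate calculation per value of BAD. The helper `missing_rate_by_outcome` is a hypothetical name, and the rows below are invented toy records, not values from the actual data set.

```python
from collections import defaultdict


def missing_rate_by_outcome(rows, column, outcome="BAD"):
    """Share of rows where `column` is missing, per value of `outcome`."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        totals[row[outcome]] += 1
        if row.get(column) is None:
            missing[row[outcome]] += 1
    return {key: missing[key] / totals[key] for key in totals}


# Toy records, made up purely for illustration.
rows = [
    {"BAD": 0, "DEBTINC": 34.8},
    {"BAD": 0, "DEBTINC": 29.1},
    {"BAD": 0, "DEBTINC": None},
    {"BAD": 0, "DEBTINC": 41.0},
    {"BAD": 1, "DEBTINC": None},
    {"BAD": 1, "DEBTINC": None},
    {"BAD": 1, "DEBTINC": 38.2},
    {"BAD": 1, "DEBTINC": None},
]

rates = missing_rate_by_outcome(rows, "DEBTINC")
print(rates)
```

A large gap between the two rates is the kind of signal described above: the missingness itself carries information about the outcome.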

DEROG, DELINQ, DEBTINC, CLAGE, target BAD): The diagonal plots display the count of missing data for each variable. The upper-triangle plots show side-by-side bar plots of missing-value counts within and across pairs of variables. The lower triangle uses Venn diagrams to depict the proportion of mutually missing information within each pair.
The pattern suggests that missingness in DELINQ (number of delinquent credit lines) and DEROG (number of derogatory reports) is correlated (MAR), with missing values often occurring in tandem. Missingness in CLAGE (age of oldest credit line in months) occurs randomly across both defaulters and non-defaulters, so it is categorised as MCAR.
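The overlap that the Venn diagrams depict can be approximated numerically. The sketch below computes, for each pair of variables, the share of rows missing both values among rows missing at least one of them. The function name `co_missing_share` and the toy rows are assumptions for illustration only.

```python
from itertools import combinations


def co_missing_share(rows, columns):
    """For each pair: |missing in both| / |missing in either|."""
    shares = {}
    for a, b in combinations(columns, 2):
        missing_a = {i for i, row in enumerate(rows) if row.get(a) is None}
        missing_b = {i for i, row in enumerate(rows) if row.get(b) is None}
        either = missing_a | missing_b
        both = missing_a & missing_b
        shares[(a, b)] = len(both) / len(either) if either else 0.0
    return shares


# Toy records, made up purely for illustration.
rows = [
    {"DELINQ": None, "DEROG": None, "CLAGE": 120.0},
    {"DELINQ": None, "DEROG": None, "CLAGE": 95.0},
    {"DELINQ": 0, "DEROG": 1, "CLAGE": None},
    {"DELINQ": 1, "DEROG": 0, "CLAGE": 200.0},
    {"DELINQ": None, "DEROG": 0, "CLAGE": 150.0},
]

overlap = co_missing_share(rows, ["DELINQ", "DEROG", "CLAGE"])
for pair, share in overlap.items():
    print(pair, round(share, 2))
```

A share close to 1 means two variables are nearly always missing together; a share close to 0 means their gaps rarely coincide.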
Based on the missing data mechanisms identified, we decided to use the following imputation strategies within the SAS Imputation Node:
DEBTINC (MNAR): No imputation, to avoid bias.
DELINQ and DEROG (MAR): Given that these are nominal variables, we use Decision Tree Imputation for Count data within the SAS Imputation Node to capture their MAR characteristics effectively.
CLAGE (MCAR): Distribution-based imputation.
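The SAS node itself is not reproduced here. As a rough illustration of what a distribution-based imputation for CLAGE amounts to, the sketch below replaces each missing entry with a random draw from the observed values, which tends to preserve the observed distribution. The function `impute_from_distribution` and the sample values are hypothetical, and this is not claimed to match the SAS implementation; the decision-tree step for DELINQ and DEROG is not sketched.

```python
import random


def impute_from_distribution(values, seed=0):
    """Replace None entries with random draws from the observed values."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to sample from")
    return [v if v is not None else rng.choice(observed) for v in values]


# Toy CLAGE values in months, made up purely for illustration.
clage = [94.4, None, 149.5, 93.3, None, 101.5, 77.1, None]
clage_imputed = impute_from_distribution(clage)
print(clage_imputed)
```

Observed entries are left untouched; only the gaps are filled, and every filled value is one that actually occurs in the observed data.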
The imputation has been tested on a subset of the data, and the results are promising. The figure below provides a visual inspection of the distribution of imputed values against the original data.

CLAGE, DEROG, and DELINQ.
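A visual check of this kind can be complemented by comparing a few summary statistics before and after imputation. The sketch below is one simple way to do that, assuming missing entries are represented as `None`; the `summarise` helper and the sample values are hypothetical and not taken from the data.

```python
import statistics


def summarise(values):
    """Count, mean and quartiles of the non-missing entries."""
    observed = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(observed, n=4)
    return {
        "n": len(observed),
        "mean": statistics.fmean(observed),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


# Toy values in months, made up purely for illustration.
original = [94.4, None, 149.5, 93.3, None, 101.5, 77.1, 122.9, 88.0]
imputed = [94.4, 101.5, 149.5, 93.3, 88.0, 101.5, 77.1, 122.9, 88.0]

print("original:", summarise(original))
print("imputed: ", summarise(imputed))
```

If the imputation preserves the distribution, the quartiles and mean should stay close to those of the original observed values; a large shift would be a reason to revisit the strategy.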